Computer and Modernization, 2010, Vol. 1, Issue (8): 8-10. doi: 10.3969/j.issn.1006-2475.2010.08.003

• Algorithm Design and Analysis •

Improvement of Continuous Variables Processing with C4.5 Algorithm

LI Hui-hui¹, WAN Wu-zu²

  1. Department of Information Engineering, The People's Armed College of Guizhou, Guiyang 550025, China; 2. Department of Computer Science, Guizhou University, Guiyang 550025, China
  • Received: 2010-05-19  Revised: 1900-01-01  Online: 2010-08-27  Published: 2010-08-27

Abstract: The decision tree classification algorithm C4.5 is among the most widely used and classic classification algorithms in data mining. It nevertheless has some shortcomings; in particular, its processing of continuous attributes is time-consuming. This paper improves the procedure for handling continuous attributes in order to raise the algorithm's computational efficiency. Compared with the original C4.5 algorithm, the improved algorithm achieves the same accuracy and a higher computation speed when constructing a decision tree.

Key words: data mining, decision tree, C4.5 algorithm, continuous attributes
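
For context, the step the abstract calls time-consuming can be sketched as follows. This is a minimal illustration of the textbook way C4.5 handles one continuous attribute (sort the values, then score a candidate threshold between each pair of adjacent distinct values by information gain). It is not the paper's improved method, which the abstract does not describe. The function names `entropy` and `best_threshold` are hypothetical, and using midpoints as thresholds is a simplifying assumption.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a class-count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_threshold(values, labels):
    """Return (threshold, information_gain) for one continuous attribute.

    Illustrative sketch: candidate thresholds are midpoints between
    adjacent distinct sorted values; each candidate splits the samples
    into `value <= t` and `value > t`.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])  # sorting cost: O(n log n)
    n = len(pairs)
    right = Counter(label for _, label in pairs)  # all samples start on the right
    left = Counter()
    base = entropy(right, n)

    best_t, best_gain = None, 0.0
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1          # move one sample from right to left
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:   # no threshold fits between equal values
            continue
        n_left = i + 1
        n_right = n - n_left
        split_entropy = ((n_left / n) * entropy(left, n_left)
                         + (n_right / n) * entropy(right, n_right))
        gain = base - split_entropy
        if gain > best_gain:
            best_t, best_gain = (value + next_value) / 2, gain
    return best_t, best_gain
```

In a full tree builder this search is repeated for every continuous attribute at every node, which is why the sorting and threshold scan account for much of the running time on numeric data.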